Text Detection in Chart Images

Internal identifier: 000183 (Main/Exploration); previous: 000182; next: 000184

Authors: N. Vassilieva [Russia]; Y. Fomina [Russia]

Source:

Pattern recognition and image analysis [ISSN 1054-6618]; 2013.

RBID : Pascal:14-0004028

French descriptors

Pascal (Inist)
- Recherche information, Texte, Reconnaissance optique caractère, Reconnaissance caractère, Chaîne caractère, Reconnaissance forme, Localisation, Langage XML, Résultat expérimental, Graphe connexe, Transformation Hough.

English descriptors

KwdEn:
- Character recognition, Character string, Connected graph, Experimental result, Hough transformation, Information retrieval, Localization, Optical character recognition, Pattern recognition, Text, XML language.

Abstract

Common OCR (Optical Character Recognition) systems fail to detect and recognize small text strings of a few characters, in particular when a text line is not horizontal. Such text regions are typical of chart images. In this paper we present an algorithm that is able to detect small text regions regardless of string orientation and font size or style. We propose to use this algorithm as a preprocessing step for text recognition with a common OCR engine. According to our experimental results, one can get up to 20 times better text recognition rate and 15 times higher text recognition precision when the proposed algorithm is used to detect text location, size, and orientation before using an OCR system. Experiments have been performed on a benchmark set of 1000 chart images created with the XML/SWF Chart tool, which contain about 14,000 text regions in total.

Affiliations:

Russia

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000032
to stream PascalFrancis, to step Curation: 000732
to stream PascalFrancis, to step Checkpoint: 000026
to stream Main, to step Merge: 000186
to stream Main, to step Curation: 000183

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Text Detection in Chart Images</title>
<author><name sortKey="Vassilieva, N" sort="Vassilieva, N" uniqKey="Vassilieva N" first="N." last="Vassilieva">N. Vassilieva</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>HP Labs, I Antillorijskaya str.</s1>
<s2>St. Petersburg, 191104</s2>
<s3>RUS</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Russie</country>
<wicri:noRegion>St. Petersburg, 191104</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Fomina, Y" sort="Fomina, Y" uniqKey="Fomina Y" first="Y." last="Fomina">Y. Fomina</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Studio Mobile, 18A Bolshoy Prospekt</s1>
<s2>St. Petersburg, 197198</s2>
<s3>RUS</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Russie</country>
<wicri:noRegion>St. Petersburg, 197198</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">14-0004028</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 14-0004028 INIST</idno>
<idno type="RBID">Pascal:14-0004028</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000032</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000732</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000026</idno>
<idno type="wicri:doubleKey">1054-6618:2013:Vassilieva N:text:detection:in</idno>
<idno type="wicri:Area/Main/Merge">000186</idno>
<idno type="wicri:Area/Main/Curation">000183</idno>
<idno type="wicri:Area/Main/Exploration">000183</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Text Detection in Chart Images</title>
<author><name sortKey="Vassilieva, N" sort="Vassilieva, N" uniqKey="Vassilieva N" first="N." last="Vassilieva">N. Vassilieva</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>HP Labs, I Antillorijskaya str.</s1>
<s2>St. Petersburg, 191104</s2>
<s3>RUS</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Russie</country>
<wicri:noRegion>St. Petersburg, 191104</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Fomina, Y" sort="Fomina, Y" uniqKey="Fomina Y" first="Y." last="Fomina">Y. Fomina</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Studio Mobile, 18A Bolshoy Prospekt</s1>
<s2>St. Petersburg, 197198</s2>
<s3>RUS</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Russie</country>
<wicri:noRegion>St. Petersburg, 197198</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Pattern recognition and image analysis</title>
<title level="j" type="abbreviated">Pattern recognit. image anal.</title>
<idno type="ISSN">1054-6618</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Pattern recognition and image analysis</title>
<title level="j" type="abbreviated">Pattern recognit. image anal.</title>
<idno type="ISSN">1054-6618</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Character string</term>
<term>Connected graph</term>
<term>Experimental result</term>
<term>Hough transformation</term>
<term>Information retrieval</term>
<term>Localization</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Text</term>
<term>XML language</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Recherche information</term>
<term>Texte</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Chaîne caractère</term>
<term>Reconnaissance forme</term>
<term>Localisation</term>
<term>Langage XML</term>
<term>Résultat expérimental</term>
<term>Graphe connexe</term>
<term>Transformation Hough</term>
<term>.</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Common OCR (Optical Character Recognition) systems fail to detect and recognize small text strings of few characters, in particular when a text line is not horizontal. Such text regions are typical for chart images. In this paper we present an algorithm that is able to detect small text regions regardless of string orientation and font size or style. We propose to use this algorithm as a preprocessing step for text recognition with a common OCR engine. According to our experimental results, one can get up to 20 times better text recognition rate, and 15 times higher text recognition precision when the proposed algorithm is used to detect text location, size and orientation, before using an OCR system. Experiments have been performed on a benchmark set of 1000 chart images created with the XML/SWF Chart tool, which contain about 14000 text regions in total.</div>
</front>
</TEI>
<affiliations><list><country><li>Russie</li>
</country>
</list>
<tree><country name="Russie"><noRegion><name sortKey="Vassilieva, N" sort="Vassilieva, N" uniqKey="Vassilieva N" first="N." last="Vassilieva">N. Vassilieva</name>
</noRegion>
<name sortKey="Fomina, Y" sort="Fomina, Y" uniqKey="Fomina Y" first="Y." last="Fomina">Y. Fomina</name>
</country>
</tree>
</affiliations>
</record>
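
As an illustration of how a record like the one above could be read programmatically, here is a minimal Python sketch using only the standard library. It is not part of Dilib. The record uses the `wicri:` and `inist:` prefixes without declaring them, so a strict XML parser rejects it as written. The sketch therefore wraps the text in a hypothetical `wrapper` element that binds both prefixes to placeholder URIs. The URIs, the `wrapper` element, and the `parse_record` helper are assumptions made for this example, and `RECORD` is a shortened excerpt of the record.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the record (one author, two keywords).
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Text Detection in Chart Images</title>
<author><name sortKey="Vassilieva, N" first="N." last="Vassilieva">N. Vassilieva</name>
<affiliation wicri:level="1"><country>Russie</country></affiliation>
</author>
</titleStmt></fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Hough transformation</term>
</keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

# Placeholder namespace URIs (assumptions): the record itself declares none.
WRAPPER = ('<wrapper xmlns:wicri="urn:example:wicri" '
           'xmlns:inist="urn:example:inist">{}</wrapper>')


def parse_record(xml_text):
    """Return the title, author names and English keywords of one record."""
    # Binding the prefixes in a wrapper element lets the parser accept
    # attributes such as wicri:level.
    root = ET.fromstring(WRAPPER.format(xml_text))
    record = root.find("record")
    title = record.findtext(".//titleStmt/title")
    authors = [n.text for n in record.findall(".//titleStmt/author/name")]
    keywords = [t.text
                for t in record.findall(".//keywords[@scheme='KwdEn']/term")]
    return {"title": title, "authors": authors, "keywords": keywords}


info = parse_record(RECORD)
print(info["title"])
print(info["authors"])
print(info["keywords"])
```

The lookups are restricted to `titleStmt` because the same title and author elements are repeated under `sourceDesc`, which would otherwise yield duplicates on the full record.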

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000183 | SxmlIndent | more

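Or, assuming EXPLOR_AREA is already set to the OcrV1 area directory ($WICRI_ROOT/Ticri/CIDE/explor/OcrV1):

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000183 | SxmlIndent | more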

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:14-0004028
   |texte=   Text Detection in Chart Images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
